Every new morning presents investors and managers with a host of decisions to make. Making these decisions well has made data science tools and resources essential.
Hence, in this post, we will perform exploratory data analysis (EDA) using AutoViz, a Python library I recently discovered. The data set under consideration can be found here
from autoviz.AutoViz_Class import AutoViz_Class
AV = AutoViz_Class()
Imported v0.1.58. After importing, execute '%matplotlib inline' to display charts in Jupyter.
AV = AutoViz_Class()
dfte = AV.AutoViz(filename, sep=',', depVar='', dfte=None, header=0, verbose=1, lowess=False,
chart_format='svg',max_rows_analyzed=150000,max_cols_analyzed=30, save_plot_dir=None)
Update: verbose=0 displays charts in your local Jupyter notebook.
verbose=1 additionally provides EDA data cleaning suggestions. It also displays charts.
verbose=2 does not display charts but saves them in AutoViz_Plots folder in local machine.
chart_format='bokeh' displays charts in your local Jupyter notebook.
chart_format='server' displays charts in your browser: one tab for each chart type
chart_format='html' silently saves interactive HTML files in your local machine
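The `verbose` and `chart_format` options printed above interact, so it can help to keep the combinations in one place. Below is a minimal sketch: `autoviz_kwargs` is a hypothetical helper (not part of AutoViz) that only builds a keyword-argument dictionary from the parameter names in the printed usage template. Pairing `verbose=0` with the `bokeh`, `server` and `html` formats is an assumption.

```python
def autoviz_kwargs(mode="inline", plot_dir=None):
    """Build keyword arguments for AV.AutoViz for one display mode.

    Hypothetical helper: the parameter names (verbose, chart_format,
    save_plot_dir) come from the usage message AutoViz prints on import;
    using verbose=0 with the interactive formats is an assumption.
    """
    modes = {
        "inline": {"verbose": 0, "chart_format": "svg"},       # charts in the notebook
        "suggestions": {"verbose": 1, "chart_format": "svg"},  # charts plus cleaning suggestions
        "save": {"verbose": 2, "chart_format": "svg"},         # no display; saved to AutoViz_Plots
        "bokeh": {"verbose": 0, "chart_format": "bokeh"},      # interactive charts in the notebook
        "server": {"verbose": 0, "chart_format": "server"},    # one browser tab per chart type
        "html": {"verbose": 0, "chart_format": "html"},        # interactive HTML files saved locally
    }
    if mode not in modes:
        raise ValueError(f"unknown mode {mode!r}; choose one of {sorted(modes)}")
    # save_plot_dir=None matches the default shown in the usage template.
    return {**modes[mode], "save_plot_dir": plot_dir}
```

For example, `AV.AutoViz(filename, sep=",", depVar="Profit", **autoviz_kwargs("save"))` should save the charts instead of displaying them, according to the usage message.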
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
# Render matplotlib charts inside the notebook
%matplotlib inline
# Keep the path and separator for AutoViz, and load the same file with pandas
filename = 'SampleSuperstore.csv'
sep = ","
data = pd.read_csv(filename, sep=sep)
dft = AV.AutoViz(
filename,
sep=",",
depVar="Profit",
dfte=None,
header=0,
verbose=0,
lowess=False,
chart_format="svg",
max_rows_analyzed=150000,
max_cols_analyzed=50,
)
Shape of your Data Set loaded: (9994, 13)
#######################################################################################
######################## C L A S S I F Y I N G V A R I A B L E S ####################
#######################################################################################
Classifying variables in data set...
Number of Numeric Columns = 2
Number of Integer-Categorical Columns = 2
Number of String-Categorical Columns = 6
Number of Factor-Categorical Columns = 0
Number of String-Boolean Columns = 0
Number of Numeric-Boolean Columns = 0
Number of Discrete String Columns = 1
Number of NLP String Columns = 0
Number of Date Time Columns = 0
Number of ID Columns = 0
Number of Columns to Delete = 1
12 Predictors classified...
1 variables removed since they were ID or low-information variables
################ Regression problem #####################
Number of All Scatter Plots = 3
All Plots done
Time to run AutoViz = 45 seconds
###################### AUTO VISUALIZATION Completed ########################
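AutoViz's own variable-classification rules are internal to the library, but a rough pandas cross-check of the summary above can catch surprises such as a constant column. The sketch below is a simplified, hypothetical heuristic (the function name, the buckets and the `max_int_categories=20` threshold are all assumptions), so its counts will not necessarily match AutoViz's. It also makes a simple guess at the problem type: a float target such as `Profit` is labeled regression, consistent with the "Regression problem" line in the log.

```python
import pandas as pd


def rough_column_summary(df, target, max_int_categories=20):
    """Sort predictor columns into coarse buckets and guess the problem type.

    Hypothetical heuristic for a sanity check; it is NOT AutoViz's
    classification logic, and the threshold is an arbitrary assumption.
    """
    summary = {"numeric": [], "integer_categorical": [], "string": [], "constant": []}
    for col in df.columns:
        if col == target:
            continue  # the target is not a predictor
        s = df[col]
        if s.nunique(dropna=True) <= 1:
            summary["constant"].append(col)  # one value: no information, candidate to drop
        elif pd.api.types.is_integer_dtype(s) and s.nunique() <= max_int_categories:
            summary["integer_categorical"].append(col)  # few distinct integers
        elif pd.api.types.is_numeric_dtype(s):
            summary["numeric"].append(col)
        else:
            summary["string"].append(col)  # object/string columns

    y = df[target]
    many_ints = pd.api.types.is_integer_dtype(y) and y.nunique() > max_int_categories
    # Float targets, or integer targets with many levels, are treated as regression.
    summary["problem_type"] = (
        "regression" if pd.api.types.is_float_dtype(y) or many_ints else "classification"
    )
    return summary
```

On the notebook's data this would be called as `rough_column_summary(data, "Profit")`; differences from the AutoViz counts simply reflect the simpler rules.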
dft
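Because the CSV is already loaded into `data` with `pd.read_csv`, AutoViz can also be pointed at that DataFrame instead of re-reading the file. A minimal sketch, assuming the `dfte` parameter shown in the printed usage template is used together with an empty `filename` (worth checking against the AutoViz version you have installed); `autoviz_from_dataframe` is a hypothetical wrapper, and its optional `av` argument exists only so the helper can be exercised without AutoViz.

```python
import pandas as pd


def autoviz_from_dataframe(df, target, av=None, **options):
    """Run AutoViz on an in-memory DataFrame (hypothetical wrapper).

    Assumes AutoViz reads from `dfte` when `filename` is an empty string,
    based on the dfte parameter in its printed usage template.
    """
    if not isinstance(df, pd.DataFrame):
        raise TypeError("df must be a pandas DataFrame")
    if target and target not in df.columns:
        raise KeyError(f"target column {target!r} not found in DataFrame")
    if av is None:
        # Imported lazily so the helper can be defined without AutoViz installed.
        from autoviz.AutoViz_Class import AutoViz_Class
        av = AutoViz_Class()
    return av.AutoViz(
        filename="",  # empty filename: use the DataFrame passed as dfte
        sep=",",
        depVar=target,
        dfte=df,
        header=0,
        **options,  # e.g. verbose, chart_format, max_rows_analyzed
    )
```

In this notebook the call would look like `autoviz_from_dataframe(data, "Profit", av=AV, verbose=0, chart_format="svg")`, which avoids reading `SampleSuperstore.csv` twice.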